Abstract

In this paper, we clarify the relations between the existing sets of regularity conditions for 
convergence rates of nonparametric indirect regression (NPIR) and nonparametric instrumental 
variables (NPIV) regression models. We establish minimax risk lower bounds in mean integrated 
squared error loss for the NPIR and the NPIV models under two basic regularity conditions that 
allow for both mildly ill-posed and severely ill-posed cases. We show that both a simple projection 
estimator for the NPIR model, and a sieve minimum distance estimator for the NPIV model, 
can achieve the minimax risk lower bounds, and are rate-optimal uniformly over a large class of 
structure functions, allowing for mildly ill-posed and severely ill-posed cases. 
1 Introduction



Recently there has been growing interest in estimation for nonparametric instrumental variables (NPIV)
regression models, see e.g., Newey and Powell (2003), Darolles, Florens and Renault (2002), Hall and 
Horowitz (2005), Blundell, Chen and Kristensen (2007), Gagliardini and Scaillet (2006), to name 
only a few. The estimators proposed in these papers belong to three broad classes: (1) the finite 
dimensional sieve minimum distance estimator (Newey and Powell (2003), Ai and Chen (2003) 
and Blundell, Chen and Kristensen (2007)); (2) the infinite dimensional kernel based Tikhonov 
regularized estimator (Darolles, Florens and Renault (2002), Hall and Horowitz (2005), Gagliardini 
and Scaillet (2006)); and (3) the finite dimensional orthogonal series Tikhonov regularized estimator 
(Hall and Horowitz (2005)). Each of these papers presents different sets of sufficient conditions for 
consistency and convergence rates of its proposed estimators. In addition, for the mildly ill-posed 
case (when the singular values associated with the conditional expectation operator decay to zero 
at a polynomial rate), Hall and Horowitz (2005) establish the minimax risk lower bound in mean 
integrated squared error loss for the NPIV regression model under a set of regularity conditions that 
are related to their estimation procedures. They also show that their proposed estimators achieve 
this lower bound; hence their rate is optimal for the class of structure functions they consider. 

To the best of our knowledge, there is no published work that discusses the relations among the
different sets of sufficient conditions imposed in these various papers. Therefore, it is unclear whether 
the minimax risk lower bound derived in Hall and Horowitz (2005) is still the lower bound under 
regularity conditions stated in the other papers. It is also unclear whether the estimators proposed 
in the other papers are rate optimal in a minimax framework corresponding to the conditions stated 
in these papers. Moreover, when the NPIV problem is severely ill-posed (for instance, when the 
singular values associated with the conditional expectation operator decay to zero at an exponential
rate), there are no published results on minimax rates. 

In this paper, we address these issues based on a general formulation of the problems. In Section
2, we first present the NPIV models. We then provide two basic regularity conditions: the
approximation and the link conditions. The approximation condition is about the complexity of the class
of the structural functions, which is measured as the best finite dimensional linear approximation
error rate in terms of a basis expansion that may not be the eigenfunction basis of the conditional
expectation operator. The link condition is about the relative smoothness of the conditional
expectation operator in terms of the basis used in the first condition. We show that these two regularity
conditions are natural generalizations of, and are automatically satisfied by, the so-called "general
source condition", an assumption commonly imposed in the literature on ill-posed inverse problems.
Our two basic regularity conditions are also implied by the ones assumed in the literature on NPIV
models, such as those imposed in Darolles, Florens and Renault (2002), Hall and Horowitz (2005),
and Blundell, Chen and Kristensen (2007). In Section 3, we first show that the NPIV model is no
more informative than the reduced form nonparametric indirect regression (NPIR) model (actually 
the model assuming a known conditional expectation operator of the endogenous regressor given 
the instrumental variables). Under the two basic regularity conditions stated in Section 2, we derive 
the minimax risk lower bound in mean integrated squared error loss for the NPIR and the NPIV 
models, allowing for both the mildly ill-posed case and the severely ill-posed case. In Section 4, 
we present a simple projection estimator for the NPIR models, and establish that it achieves the 
lower bounds and hence is rate-optimal in the minimax sense. When restricting our conditions to 
various special cases, including the nonparametric mean regression models and the NPIR models 
under general source conditions, our results reproduce the existing known minimax optimal rates 
for these special cases. But more importantly, our minimax optimal rate results cover many new 
cases as long as their model specifications satisfy the approximation and the link conditions. We 
also discuss what could happen if the link condition on the relative smoothness of the conditional 
expectation operator is not satisfied. In Section 5, we show that the sieve minimum distance (SMD) 
estimator for the NPIV models is rate-optimal in the minimax sense. In fact, we show that both 
the projection estimator for the NPIR models and the SMD estimator for the NPIV models are 
rate-optimal uniformly over a large class of structure functions, allowing for arbitrarily decaying 
speed of the singular values of the conditional expectation operator. Section 6 provides some further 
discussions on the regularity conditions. Section 7 briefly concludes, and all the proofs are gathered 
in the Appendix. 

Before we conclude this introduction, we mention closely related work in more abstract settings of 
linear ill-posed inverse problems. First, there exist many papers and some monographs devoted to 
constructing estimators and deriving optimal convergence rates in the deterministic noise framework 
with a known operator (or a known operator up to a deterministically perturbed error with a 
specified error rate). See, e.g., Engl, Hanke and Neubauer (1996), Nair, Pereverzev and Tautenhahn 
(2005) and the references therein. Second, there are also many results on minimax optimal rates in 
mean integrated squared error loss in the random white noise framework with a known operator; 
see, e.g., Cohen, Hoffmann and Reiß (2004), Bissantz, Hohage, Munk and Ruymgaart (2007) and
the references therein. Third, there are a few recent papers on constructing estimators that achieve
optimal convergence rates in the presence of white noise and an unknown operator, but assuming
the existence of an estimator of the operator with a rate. See, e.g., Efromovich and Koltchinskii
(2001) and Hoffmann and Reiß (2007).
2 NPIV models and basic regularity conditions



We first specify the NPIV regression model as

$$Y_i = h_0(X_i) + U_i, \qquad E[U_i \mid W_i] = 0, \qquad i = 1, \dots, n, \tag{2.1}$$

with observations $\{(X_i, Y_i, W_i)\}_{i=1}^n$, a random sample from the unknown joint distribution of
$(X, Y, W)$. Here $Y$ is a scalar dependent variable, $X$ is a vector of endogenous regressors in $\mathbb{R}^d$
and $W$ is a vector of instrumental variables in $\mathbb{R}^d$ that satisfy the property $E[U \mid W] = 0$. (For
ease of presentation we assume that $X$ and $W$ do not contain any common variables, and that the
conditional density of $X$ given $W$ is well-defined.) The parameter of interest is the unknown structure
function $h_0(\cdot)$, while the joint law $\mathcal{L}_{UWX}$ of $(U, W, X)$ is an unknown nuisance function.

Let us introduce the Hilbert spaces

$$L^2_X = \{h : \mathbb{R}^d \to \mathbb{R} \mid \|h\|_X^2 := E[h^2(X)] < \infty\},$$
$$L^2_W = \{g : \mathbb{R}^d \to \mathbb{R} \mid \|g\|_W^2 := E[g^2(W)] < \infty\}.$$

Since the conditional distribution of $X$ given $W$ is unspecified, the conditional expectation operator

$$(Kh)(w) := E[h(X) \mid W = w]$$

is unknown, except that it is an integral operator mapping from $L^2_X$ to $L^2_W$. This operator is the key
in the construction of estimators of $h_0$ because by conditioning on $W$ in (2.1) and using $E[U \mid W] = 0$
we obtain

$$E[Y \mid W] = E[h_0(X) \mid W] + E[U \mid W] = Kh_0(W).$$

Consequently, by regressing $Y$ on $W$, estimating $K$ and using this relationship we can hope to
retrieve an estimator of $h_0$.
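To make the operator concrete, the following minimal sketch assumes, purely for illustration, that $X$ and $W$ take finitely many values, so that $K$ reduces to a matrix of conditional probabilities. The probability table `joint`, the function `h` and the helper names are hypothetical and not part of the paper; the continuous case treated in the text requires an integral operator instead.

```python
# Sketch: the conditional expectation operator K for discrete X and W.
# Assumption (not from the paper): joint[w][x] = P(W = w, X = x) is a
# made-up probability table on a 2-point W and a 3-point X.

def conditional_expectation_operator(joint):
    """Return the matrix K with K[w][x] = P(X = x | W = w)."""
    K = []
    for row in joint:
        p_w = sum(row)  # marginal probability P(W = w)
        K.append([p / p_w for p in row])
    return K

def apply_operator(K, h):
    """Compute (Kh)(w) = E[h(X) | W = w] for every value w."""
    return [sum(k_wx * h_x for k_wx, h_x in zip(row, h)) for row in K]

joint = [[0.30, 0.10, 0.10],
         [0.05, 0.25, 0.20]]
h = [1.0, 2.0, 4.0]            # values h(x) on the support of X
K = conditional_expectation_operator(joint)
Kh = apply_operator(K, h)      # values E[h(X) | W = w]
```

In this toy setting the relation $E[Y \mid W] = Kh_0(W)$ says that the regression of $Y$ on $W$ equals the vector `Kh`; recovering `h` from `Kh` then amounts to inverting the matrix `K`, which is the finite-dimensional analogue of the inverse problem studied here.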

Let $\mathcal{H}$ denote a subset of $L^2_X$ and assume $h_0 \in \mathcal{H}$. Here $\mathcal{H}$ captures all the prior information
(such as the smoothness and/or shape properties) about the unknown structure function $h_0$. To
ensure that there is a unique solution $h_0 \in \mathcal{H}$ for the NPIV model (2.1), in this paper we assume
that the operator $K$ satisfies the following restriction:

$$\{h \in \mathcal{H} : Kh = 0\} = \{0\}. \tag{2.2}$$

Depending on the choice of the function class $\mathcal{H}$, the identification condition (2.2) imposes different
restrictions on the operator $K$ (or equivalently, on the conditional density of $X$ given $W$). For
example, if $\mathcal{H} = L^2_X$, then condition (2.2) becomes the standard identification condition that $K$
is injective, i.e., $\mathcal{N}(K) := \{h \in L^2_X : Kh = 0\} = \{0\}$ (or equivalently, the conditional density
of $X$ given $W$ is complete); see, e.g., Newey and Powell (2003), Darolles, Florens and Renault
(2002), Carrasco, Florens and Renault (2007). If $\mathcal{H} = \{h \in L^2_X : \sup_x |h(x)| \le 1\}$, then condition
(2.2) corresponds to assuming that the conditional density of $X$ given $W$ is bounded complete;
see, e.g., Chernozhukov and Hansen (2005), Chernozhukov, Imbens and Newey (2007), Blundell,
Chen and Kristensen (2007). For additional results on identification in semi/nonparametric models
with endogeneity, see, e.g., Blundell and Powell (2003), Florens (2003), Florens, Johannes and Van
Bellegem (2007) and the references therein.

2.1 Basic regularity conditions 

In this paper we would like to establish a minimax risk lower bound for the NPIV model, that is,
we would like to derive a result of the form: there are a finite constant $c > 0$ and a rate function
$\delta_n \downarrow 0$ as $n \uparrow \infty$ such that

$$\inf_{\hat h_n} \sup_{h \in \mathcal{H}} E_h\big[\|\hat h_n - h\|_X^2\big] \ge c\,\delta_n,$$

where the infimum is over all possible estimators $\hat h_n$ for $h \in \mathcal{H}$. Note that an NPIV model (2.1)
is completely specified by prescribing the joint law $\mathcal{L}_{UWX}$ of $(U, W, X)$ and the structure function
$h$. This lower bound $\delta_n$ will be valid for quite general forms of $\mathcal{L}_{UWX}$, independently of knowing
or not knowing it. In particular, although the mean squared error loss and the class of structure
functions $\mathcal{H}$ will be defined in terms of the distribution of $X$, there is no need to assume any
explicit properties of this distribution to derive a minimax lower bound.

We would also like to present some particular estimators that attain the lower bound rate $\delta_n$.
However, before we can establish any minimax lower and upper bounds, it is clear that we have to
impose some conditions on the class of structure functions $\mathcal{H}$ and on the conditional expectation
operator $K$. In this paper, we implicitly assume that the prior information about $\mathcal{H}$ already
includes some regularity properties that can be described in terms of a Hilbert scale generated
by a conveniently chosen (by the researcher) operator $B$. The regularizing action of the conditional
expectation operator $K$ is also described as some smoothness relative to the known operator
$B$. Formally, let $B : \mathrm{Dom}(B) \subset L^2_X \to L^2_X$ be a densely defined self-adjoint, strictly positive definite,
and unbounded operator (such as differential operators with boundary constraints). For ease
of presentation we assume that $B$ has eigenvalues $\nu_k \uparrow \infty$ with corresponding $L^2_X$-normalized
eigenfunctions $\{u_k\}$, which then form an orthonormal basis of $L^2_X$. For non-discrete spectrum our
results will still hold, but the presentation would become more technical, using spectral measures
and abstract functional calculus.
Throughout this paper we denote by $\mathcal{H}(r,R)$ a subset of $\mathcal{H} \subset L^2_X$, and assume the following:

2.1 Assumption (approximation condition). There are finite constants $r, R > 0$ such that $\mathcal{H}(r,R)$
consists of functions $h$ satisfying

$$\inf_{(a_k)} \Big\| h - \sum_{k=1}^{m} a_k u_k \Big\|_X^2 \le R^2 \nu_{m+1}^{-2r} \quad \text{for all } m \in \mathbb{N}. \tag{2.3}$$

Note that the left-hand side of (2.3) gives the error in approximating $h$ optimally by an element of
the $m$-dimensional space spanned by the basis functions $\{u_1, \dots, u_m\}$. So, Assumption 2.1
characterizes the regularity (or smoothness) of the structure functions in $\mathcal{H}(r,R)$ by the $L^2_X$-approximation
error rates when they are approximated by the basis $\{u_k\}$ associated with $B$. Clearly, Assumption
2.1 will give a bound on the bias and implies that $\mathcal{H}(r,R)$ is a compact set in $L^2_X$. For many
typical smooth function classes and basis functions like the Fourier basis, wavelets or splines the
approximation error rates are well known.
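As an illustration of the approximation condition, the sketch below checks the bound "squared tail error at most $R^2 \nu_{m+1}^{-2r}$" for two coefficient sequences in an orthonormal basis. The choices $\nu_k = k$, $r = 1$, $R = 1$, the truncation at 200 coefficients and both sequences are assumptions made only for this example. In an orthonormal basis the best $m$-term approximation error is the sum of the squared coefficients with index larger than $m$.

```python
import math

def tail_error(coeffs, m):
    """Squared L2 error of the best approximation by the first m basis functions."""
    return sum(c * c for c in coeffs[m:])

def satisfies_approximation_condition(coeffs, nu, r, R, tol=1e-12):
    """Check tail_error(m) <= R^2 * nu_{m+1}^(-2r) for m = 1, ..., len(coeffs) - 1.

    nu[k - 1] holds the eigenvalue nu_k, so nu[m] is nu_{m+1}.
    """
    return all(
        tail_error(coeffs, m) <= R**2 * nu[m] ** (-2 * r) + tol
        for m in range(1, len(coeffs))
    )

N = 200                                    # illustrative truncation level
nu = [float(k) for k in range(1, N + 1)]   # assumed eigenvalues nu_k = k

# Coefficients with sum_k nu_k^2 c_k^2 <= 1, i.e. inside the unit ellipsoid for r = 1.
smooth = [math.sqrt(6.0) / (math.pi * k**2) for k in range(1, N + 1)]
# Coefficients decaying only like 1/k, too slowly for r = 1.
rough = [1.0 / k for k in range(1, N + 1)]

smooth_ok = satisfies_approximation_condition(smooth, nu, r=1.0, R=1.0)
rough_ok = satisfies_approximation_condition(rough, nu, r=1.0, R=1.0)
```

The first sequence passes the check for every $m$, while the slowly decaying one already fails at $m = 1$.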

For any $s > 0$ and $h \in \mathrm{Dom}(B^s) \subset L^2_X$ we write $\|h\|_s := \|B^s h\|_X$. Let $H^s$ denote the completion
of $\mathrm{Dom}(B^s)$ under the norm $\|\cdot\|_s$. $\{H^s\}_{s \ge 0}$ is called a Hilbert scale generated by $B$ (see, e.g., Engl,
Hanke and Neubauer (1996) for its detailed properties). For any finite constants $r, R > 0$, we define
a Sobolev-type ellipsoid as $H^r_R := \{h \in H^r : \|h\|_r \le R\}$. Since

$$H^r_R = \Big\{h = \sum_{k=1}^{\infty} \langle h, u_k\rangle_X u_k,\ \|h\|_r^2 = \sum_{k=1}^{\infty} \nu_k^{2r} \langle h, u_k\rangle_X^2 \le R^2\Big\},$$

it is clear that $H^r_R$ is a subset of $\mathcal{H}(r,R)$. It is also easy to see that the following hyperrectangle
$Q^r_{R'}$ in $L^2_X$ is a subset of $\mathcal{H}(r,R)$ for $R' > 0$ sufficiently small:

$$Q^r_{R'} := \Big\{h = \sum_{k=1}^{\infty} \langle h, u_k\rangle_X u_k,\ |\langle h, u_k\rangle_X| \le R' \nu_k^{-\beta}\Big\}, \qquad \beta = r + \tfrac12 > \tfrac12.$$

Let us now formulate the mapping properties of the conditional expectation operator $K$ in terms
of the (generalized) Hilbert scale generated by $B$.

2.2 Assumption (link condition). There are a continuous increasing function $\varphi : \mathbb{R}^+ \to \mathbb{R}^+$ and
a constant $M > 0$ such that $\|Kh\|_W \le M \|[\varphi(B^{-2})]^{1/2} h\|_X$ for all $h \in L^2_X$.

Assumption 2.2 is in fact equivalent to the range inclusion condition:

$$\mathrm{ran}(|K|) \subset \mathrm{ran}\big([\varphi(B^{-2})]^{1/2}\big) \quad \text{with } |K| := (K^*K)^{1/2},$$

where $K^*$ denotes the Hilbert space ($L^2_X$) adjoint of $K$. For the NPIV models, under mild
conditions, the conditional expectation operator $K$ is a compact operator. Thus the self-adjoint compact
operator $K^*K$ has the eigenvalue-eigenfunction decomposition $\{\lambda_k, e_k\}$, where the eigenvalues are
arranged in non-increasing order: $\lambda_k \ge \lambda_{k+1} \ge \dots > 0$ and $\lambda_k$ tends to zero as $k \uparrow \infty$. Then
Assumption 2.2 can be equivalently restated in terms of two possibly different orthonormal bases
$\{e_k\}$ and $\{u_k\}$ of $L^2_X$:

$$\sum_{k=1}^{\infty} \lambda_k \langle h, e_k\rangle_X^2 \le M^2 \sum_{k=1}^{\infty} \varphi(\nu_k^{-2}) \langle h, u_k\rangle_X^2 \quad \text{for all } h \in L^2_X. \tag{2.4}$$

2.3 Remark. We can rewrite Assumptions 2.1 and 2.2 without specifying the operator $B$ explicitly.
All we require is the existence of an orthonormal basis $\{u_k\}$ in $L^2_X$ and a sequence of increasing
positive real numbers $\{\nu_k\}$ such that equations (2.3) and (2.4) hold. In fact, we can then construct
the self-adjoint unbounded operator $B$ according to

$$Bh = \sum_{k=1}^{\infty} \nu_k \langle h, u_k\rangle_X u_k,$$

with $\mathrm{Dom}(B) = \{h \in L^2_X : \sum_{k=1}^{\infty} \nu_k^2 \langle h, u_k\rangle_X^2 < \infty\}$.

2.4 Example. Suppose that $X$ is uniformly distributed on the interval $[0, 1]$ and let $Bf(x) :=
-f''(x)$ for all $f \in L^2([0, 1])$ with $f'' \in L^2([0, 1])$ and with periodic boundary conditions. Then $B$
has (complex-valued) eigenfunctions $u_k(x) = \exp(2\pi k i x)$ with eigenvalues $\nu_k = (2\pi k)^2$ such that

$$H^r = \Big\{f \in L^2([0,1]) : \sum_{k \in \mathbb{Z}} \nu_k^{2r} |\langle f, u_k\rangle|^2 < \infty\Big\}$$

is the classical $L^2$-Sobolev space $H^{2r}$ of regularity (smoothness) $2r$ with periodic boundary
conditions. See, e.g., Edmunds and Evans (1987) for many examples of generating smooth function
spaces from differential operators.

For the typical choice $\varphi(t) = t^a$ for some $a > 0$, Assumption 2.2 translates to $\|Kh\|_W \le
M\|B^{-a}h\|_X$, which means intuitively that the operator $K$ regularizes at least as much as $B^{-a}$.
In the case $Bf(x) := -f''(x)$ the operator $K$ acts like integrating at least $(2a)$-times, i.e. maps $L^2$
to the $L^2$-Sobolev space of regularity $2a$.

In the statistics literature, for the standard nonparametric mean regression model (i.e., the model in 
which K is the identity operator), the minimax risk lower and upper bounds have been established 
in mean integrated squared error loss for various classes of functions $\mathcal{H}$ such as a Sobolev ball
(ellipsoid), a Hölder ball (hyperrectangle) or a Besov ball (ellipsoid or hyperrectangle or Besov
body); see, e.g., Donoho, Liu and MacGibbon (1990), Yang and Barron (1999) and the references
therein. As shown in these papers, what matters for minimax risk lower and upper bounds for
nonparametric mean regression estimation is the complexity of the class of functions, which can be
measured in terms of best finite dimensional approximation numbers. This motivates us to impose
Assumption 2.1. However, since the basis $\{u_k\}$ (of the operator $B$) used to construct the best finite
dimensional approximations for the class of functions $\mathcal{H}$ may differ from the eigenfunction basis
$\{e_k\}$ (of the operator $K^*K$), we have to impose Assumption 2.2 to link these two.

We shall refer to Assumptions 2.1 and 2.2 as the two basic regularity conditions, and sometimes
call Assumption 2.1 the approximation condition and Assumption 2.2 the link condition. Both
assumptions are implied by the conditions imposed in the literature, such as those in Cohen, Hoffmann
and Reiß (2004), Efromovich and Koltchinskii (2001), Hoffmann and Reiß (2007), Blundell, Chen
and Kristensen (2007), Chen and Pouzo (2007) and others. In the next subsection we show that
these two basic regularity conditions are automatically satisfied under the so-called "general source
condition", which in turn is satisfied under the conditions imposed in Hall and Horowitz (2005) and all
the other papers using the general source condition.

2.2 Relation to source conditions 

In the numerical analysis literature on ill-posed inverse problems it is common to measure the
smoothness (regularity) of the function class $\mathcal{H}$ according to the spectral representation of the
operator $K^*K$. Denote by $\|K\| := \sup_{h : \|h\|_X \le 1} \|Kh\|_W$ the operator norm. The so-called "general
source condition" assumes that there is a continuous function $\psi$ defined on $[0, \|K\|^2]$ with $\psi(0) = 0$
and $\lambda^{-1/2}\psi(\lambda)$ non-decreasing such that the function class is

$$\mathcal{H}_{source} := \big\{h = \psi(K^*K)g,\ g \in L^2_X,\ \|g\|_X^2 \le R^2\big\}, \quad \text{for a finite constant } R, \tag{2.5}$$

and the original "source condition" corresponds to the choice $\psi(\lambda) = \lambda^{1/2}$ (see Engl, Hanke and
Neubauer (1996)). If $K^*K$ is compact with eigenvalue-eigenfunction system $\{\lambda_k, e_k\}$, then (2.5) is
equivalent to

$$\mathcal{H}_{source} = \Big\{h = \sum_{k=1}^{\infty} \langle h, e_k\rangle_X e_k :\ \sum_{k=1}^{\infty} \frac{\langle h, e_k\rangle_X^2}{\psi^2(\lambda_k)} \le R^2\Big\}.$$

Therefore the general source condition implies our Assumptions 2.1 and 2.2 by setting $u_k = e_k$ and
$\nu_k^{-r} = \psi(\lambda_k)$ for all $k \ge 1$, and $\varphi(B^{-2}) = K^*K$.

In the econometrics literature on NPIV estimation, Darolles, Florens and Renault (2002) impose
a smoothness condition on the true structure function $h_0$ that is closely related to the source
condition. Precisely, they assume $h_0 \in \mathcal{H}_{dfr}$, where

[display (2.6), defining $\mathcal{H}_{dfr}$ by a series condition with an exponent $\alpha$, is missing from the source]
Darolles, Florens and Renault (2002) use this assumption $h_0 \in \mathcal{H}_{dfr}$ to establish the convergence
rate of their kernel-based Tikhonov regularized estimator in mean squared error metric
$E_{h_0}[\|\hat h - h_0\|_X^2]$. This rate, however, will not hold uniformly over $h_0 \in \mathcal{H}_{dfr}$, since the series in (2.6) is not
uniformly bounded away from infinity, which is the role of $R \in (0, \infty)$ in the definition of $\mathcal{H}_{source}$.

Hall and Horowitz (2005) assume that $h_0$ belongs to a hyperrectangle in $L^2_X$, using the
eigenfunctions $\{e_k\}$ of the operator $K^*K$ as a basis:

$$\mathcal{H}_{HH} := \Big\{h = \sum_{k=1}^{\infty} \langle h, e_k\rangle_X e_k,\ |\langle h, e_k\rangle_X| \le R' k^{-\beta}\Big\}, \tag{2.7}$$

which, when $\beta > 1/2$ plays the role of $r + 1/2$, implies our Assumptions 2.1 and 2.2 by setting
$u_k = e_k$, $\nu_k = k$ for all $k \ge 1$, and $\varphi(B^{-2}) = K^*K$. In addition, Hall and Horowitz (2005) also
assume that the eigenvalues of the operator $K^*K$ are such that $\lambda_k \ge \mathrm{const.}\, k^{-\alpha}$ for some
$\alpha > 1$ and $2\beta > \alpha \ge \beta - \tfrac12$, which suggests that we could set $\varphi(t) = t^{\alpha/2}$.



3 The lower bound 



In this section we shall establish a minimax risk lower bound for the NPIV model under the two basic 
regularity conditions stated in Section 2. We derive this result by first establishing that the NPIV 
model is no more informative than the reduced form nonparametric indirect regression (NPIR) 
model. First, the following abstract assumption ensures a certain complexity of the statistical 
NPIV model and permits the residuals of $Y$ given $W$ to be Gaussian. Recall that $\mathcal{L}_Z$ denotes the
law of the random vector $Z$.

3.1 Assumption. Let $\sigma_0 > 0$ be a finite constant. Let $\mathcal{C}$ be a (possibly very large) set of elements
$(\mathcal{L}_{UWX}, h)$ such that the following property holds:

• For all $h \in \mathcal{H}$, there is a law $\mathcal{L}_{UWX}$ with $(\mathcal{L}_{UWX}, h) \in \mathcal{C}$ such that $\mathcal{L}_{WY}$ is determined by
$\mathcal{L}_{UWX}$ and $h$, and that

$$V_i := Y_i - E[Y_i \mid W_i] = h(X_i) - (Kh)(W_i) + U_i$$

given $W_i$ is $N(0, \sigma^2(W_i))$-distributed with $\sigma^2(W) \ge \sigma_0^2$.

3.2 Example. A typical NPIV model (2.1) satisfying Assumption 3.1 is generated by taking $W_i$
from an arbitrary probability distribution $\mathcal{L}_W$, then generating $X_i$ according to a conditional density
of $X$ given $W$, generating $V_i$ according to $N(0, \sigma^2(W_i))$, and defining

$$U_i := (Kh)(W_i) - h(X_i) + V_i.$$
3.1 Reduction from NPIV model to NPIR model

For each NPIV model, we specify the reduced form NPIR model as

$$Y_i = (Kh)(W_i) + V_i, \qquad i = 1, \dots, n,$$

with $(W_i, V_i)$ i.i.d., $\mathcal{L}_{V \mid W = w} = N(0, \sigma^2(w))$, $h \in \mathcal{H}$ the unknown structure function, and $K$
a known injective operator from $L^2_X$ to $L^2_W$. The observations corresponding to the NPIR model are
$\{(Y_i, W_i)\}_{i=1}^n$. We shall now formally prove that the NPIV model is statistically more demanding
than an indirect regression model with known operator $K$. We compare statistical experiments in
a decision-theoretic sense (see Le Cam and Yang (2000)), and therefore have to ensure first that
the classes of parameters are compatible.

3.3 Definition. Let Assumption 3.1 hold. The NPIR model class consists of all model
parameters $(\mathcal{L}_{W'}, \sigma(\cdot), h)$ such that there is $(\mathcal{L}_{UWX}, h) \in \mathcal{C}$ with the following properties: $\mathcal{L}_W = \mathcal{L}_{W'}$,
$\sigma^2(w) \ge \sigma_0^2 > 0$, the conditional law $\mathcal{L}_{X \mid W}$ is prescribed according to $K$, and $\mathcal{L}_{U \mid WX}$ is arbitrary
among the conditions imposed in $\mathcal{C}$.

3.4 Lemma. The NPIR model is more informative than the NPIV model in the sense that for
each estimator $\tilde h_n$ for the NPIV model there is an estimator $\hat h_n$ for the NPIR model with

$$\sup_{(\mathcal{L}_W, \sigma(\cdot), h)} E_{(\mathcal{L}_W, \sigma(\cdot), h)}\big[\|\hat h_n - h\|_X^2\big] \le \sup_{(\mathcal{L}_{UWX}, h) \in \mathcal{C}} E_{(\mathcal{L}_{UWX}, h)}\big[\|\tilde h_n - h\|_X^2\big].$$

3.2 The lower bound

We now formally present the minimax risk lower bound for the NPIR and the NPIV models in mean
squared error loss. We establish the lower bound by considering asymptotically least favorable Bayes
priors, more specifically, by applying Assouad's cube technique; see e.g. Korostelev and Tsybakov
(1993) or Yang and Barron (1999). In this paper we use the notation $a_n \asymp b_n$ to mean that there
is a finite positive constant $c$ such that $c a_n \le b_n \le c^{-1} a_n$.

Since $\sup_{h \in \mathcal{H}(r,R)} E_h[\|\hat h_n - h\|_X^2] \ge \sup_{h \in H^r_R} E_h[\|\hat h_n - h\|_X^2]$, it suffices to establish the lower bound
for functions in $H^r_R$, a subset of $\mathcal{H}(r,R)$.

3.5 Theorem. Let Assumptions 2.1 and 2.2 hold. For the NPIR model we have the following
minimax risk lower bound:

$$\inf_{\hat h_n} \sup_{h \in \mathcal{H}(r,R)} E_h\big[\|\hat h_n - h\|_X^2\big] \ge c_0\, \delta_n, \qquad \delta_n := n^{-1} \sum_{k=1}^{m} [\varphi(\nu_k^{-2})]^{-1},$$

with a finite constant $c_0 > 0$ that does not depend on $n$,
where the infimum runs over all possible estimators $\hat h_n$ based on $n$ observations, and $m$ is the largest
possible integer satisfying

$$\sigma_0^2\, n^{-1} \sum_{k=1}^{m} \nu_k^{2r} [\varphi(\nu_k^{-2})]^{-1} \le R^2.$$

(1) Mildly ill-posed case: Let $\varphi(t) = t^a$ and $\nu_k \asymp k^{\varepsilon}$ for some $a, \varepsilon > 0$. If $m \asymp n^{1/(2r\varepsilon + 2a\varepsilon + 1)}$, then
$\delta_n \asymp n^{-2r/(2r + 2a + \varepsilon^{-1})}$.

(2) Severely ill-posed case: Let $\varphi(t) = \exp(-t^{-a/2})$, $\nu_k \asymp k^{\varepsilon}$ for some $a, \varepsilon > 0$. If $m = c \log(n)^{1/(a\varepsilon)}$
with a sufficiently small $c > 0$, then $\delta_n \asymp (\log n)^{-2r/a}$.
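The two cases can be explored numerically. The sketch below assumes that the cut-off $m$ is the largest integer with $\sigma_0^2 n^{-1} \sum_{k \le m} \nu_k^{2r} [\varphi(\nu_k^{-2})]^{-1} \le R^2$ and that $\delta_n = n^{-1} \sum_{k \le m} [\varphi(\nu_k^{-2})]^{-1}$, which is how the balancing rule reads after translating Remark 3.8 via $\nu_k^{-r} = \psi(\lambda_k)$ and $\varphi(\nu_k^{-2}) = \lambda_k$. The function name, the parameter values ($\nu_k = k$, $r = a = 1$, $R = \sigma_0 = 1$, $n = 10^6$) and the search limit `k_max` are hypothetical choices for illustration only.

```python
import math

def lower_bound_rate(n, nu, phi, r, R=1.0, sigma0=1.0, k_max=10_000):
    """Return (m, delta_n) under the assumed balancing rule.

    m is the largest integer with
        sigma0^2 / n * sum_{k<=m} nu(k)^(2r) / phi(nu(k)^-2) <= R^2,
    and delta_n = (1/n) * sum_{k<=m} 1 / phi(nu(k)^-2).
    """
    constraint, delta, m = 0.0, 0.0, 0
    for k in range(1, k_max + 1):
        inv_phi = 1.0 / phi(nu(k) ** -2)
        candidate = constraint + sigma0**2 * nu(k) ** (2 * r) * inv_phi / n
        if candidate > R**2:
            break  # adding index k would violate the constraint
        constraint = candidate
        delta += inv_phi / n
        m = k
    return m, delta

def nu(k):
    return float(k)                      # assumed eigenvalues nu_k = k (epsilon = 1)

def phi_mild(t):
    return t                             # phi(t) = t^a with a = 1

def phi_severe(t):
    return math.exp(-t ** -0.5)          # phi(t) = exp(-t^(-a/2)) with a = 1

n = 10**6
m_mild, delta_mild = lower_bound_rate(n, nu, phi_mild, r=1.0)
m_severe, delta_severe = lower_bound_rate(n, nu, phi_severe, r=1.0)
```

With these illustrative values the polynomial case admits a noticeably larger cut-off than the exponential case, in line with the gap between the polynomial rate in (1) and the logarithmic rate in (2).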

The next corollary follows directly from Lemma 3.4 and Theorem 3.5; hence we omit its proof.

3.6 Corollary. Let Assumptions 2.1, 2.2 and 3.1 hold. For the NPIV model we have the same
minimax risk lower bound:

$$\inf_{\hat h_n} \sup_{h \in \mathcal{H}(r,R)} E_h\big[\|\hat h_n - h\|_X^2\big] \ge c_0\, \delta_n, \quad \text{with } \delta_n \text{ given in Theorem 3.5 and a finite constant } c_0 > 0,$$

where the infimum runs over all possible estimators $\hat h_n$ based on $n$ observations.

3.7 Remark. For the proof of the lower bound we have to consider the likelihood between the
observations. This is why we require Gaussianity. Nevertheless, the proof works the same for other
error densities, but bounding the Kullback-Leibler or Hellinger distance between alternatives might
be more cumbersome.

Let us also mention that the proof strategy can also yield a lower bound for convergence in probability:

$$\inf_{\hat h_n} \sup_{h} P_h\big(\delta_n^{-1} \|\hat h_n - h\|_X^2 \ge c'\big) \ge c > 0, \quad \text{with } \delta_n \text{ given in Theorem 3.5 and constants } c, c' > 0,$$

cf. Korostelev and Tsybakov (1993).

Note that Assumption 2.2 is automatically satisfied under the general source condition with $K^*K =
\varphi(B^{-2})$. Following the proof of Theorem 3.5, we immediately obtain:

3.8 Remark. Suppose that Assumption 2.1 is satisfied with $h \in \mathcal{H}_{source}$ and $u_k = e_k$, $\nu_k^{-r} = \psi(\lambda_k)$
for all $k \ge 1$. Let $\varphi(B^{-2}) = K^*K$. Then, for the NPIR model and for the NPIV model (under Assumption
3.1), we have the same minimax risk lower bound:

$$\inf_{\hat h_n} \sup_{h \in \mathcal{H}_{source}} E_h\big[\|\hat h_n - h\|_X^2\big] \ge c_0\, \delta_n, \quad \text{with } \delta_n \text{ given in Theorem 3.5 and a finite constant } c_0 > 0,$$

where the infimum runs over all possible estimators $\hat h_n$ based on $n$ observations. An equivalent way
to determine the lower bound $\delta_n$ is to choose the largest possible integer $m$ such that

$$\delta_n = n^{-1} \sum_{k=1}^{m} \lambda_k^{-1}, \qquad \sigma_0^2\, n^{-1} \sum_{k=1}^{m} [\psi(\lambda_k)]^{-2} \lambda_k^{-1} \le R^2.$$

4 An upper bound for the NPIR model 

We prove an upper bound for the NPIR model. The aim of this section is to convince the reader
that the lower bounds given in Section 3 are rate-optimal, and to provide an easy method to attain
these rates. Again we assume that $B$ has eigenvalues $\nu_k \uparrow \infty$ with corresponding $L^2_X$-normalized
eigenfunctions $\{u_k\}$, which then form an orthonormal basis of $L^2_X$. For $m \ge 1$ we define our estimator
as

$$\hat h_n := \sum_{k=1}^{m} \hat\eta_k u_k, \qquad \hat\eta_k := \frac{1}{n} \sum_{i=1}^{n} Y_i \big((K^*)^{-1} u_k\big)(W_i). \tag{4.1}$$

This simple projection procedure using the basis $\{u_k\}$ (of $B$) does not seem to have been studied
before. It is a natural generalization of the well-known spectral cut-off method using the
eigenfunction basis $\{e_k\}$ of $K^*K$. Given the prior information about $\mathcal{H}(r,R)$, this is a mathematically
satisfactory construction.
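The coefficient estimator in (4.1) is a sample average of $Y_i g_k(W_i)$ with $g_k := (K^*)^{-1} u_k$, and since $E[Y \mid W] = (Kh)(W)$ its expectation is $\langle Kh, g_k\rangle_W = \langle h, K^* g_k\rangle_X = \langle h, u_k\rangle_X$. The following sketch verifies this population identity in a toy model in which $X$ and $W$ each take two values. The probability table, the function `h` and all helper names are hypothetical and chosen only so that the computation fits in a few lines.

```python
# Toy check that E[Y * ((K*)^{-1} u_k)(W)] equals <h, u_k>_X.
# Assumption (not from the paper): X and W are binary with a made-up joint law.

def solve_2x2(A, b):
    """Solve the 2x2 linear system A g = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def inner(f, g, p):
    """Inner product E[f g] under the probability weights p."""
    return sum(pi * fi * gi for pi, fi, gi in zip(p, f, g))

joint = [[0.4, 0.1],
         [0.1, 0.4]]                       # joint[w][x] = P(W = w, X = x)
p_w = [sum(row) for row in joint]
p_x = [sum(joint[w][x] for w in range(2)) for x in range(2)]

# K maps h to E[h(X) | W = w]; its adjoint maps g to E[g(W) | X = x].
K = [[joint[w][x] / p_w[w] for x in range(2)] for w in range(2)]
K_adj = [[joint[w][x] / p_x[x] for w in range(2)] for x in range(2)]

basis = [[1.0, 1.0], [1.0, -1.0]]          # orthonormal in L2_X because X is uniform
h = [1.0, 3.0]
Kh = [sum(K[w][x] * h[x] for x in range(2)) for w in range(2)]

true_coeffs = [inner(h, u, p_x) for u in basis]
# Population analogue of eta_hat_k: E[(Kh)(W) g_k(W)] with g_k solving K* g_k = u_k.
recovered = [inner(Kh, solve_2x2(K_adj, u), p_w) for u in basis]
```

In a sample, the expectation over $W$ is replaced by the average over the observations, and the noise $V_i$ contributes the variance term that the cut-off $m$ has to control.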

For the upper bound we impose the following assumptions on the NPIR model.

4.1 Assumption.

(1) There is a finite $\sigma_1 > 0$ such that $\sigma(w) \le \sigma_1$ for all $w \in \mathrm{supp}(\mathcal{L}_W)$;

(2) There is a finite $S > 0$ such that $\|Kh\|_\infty = \sup_{w \in \mathrm{supp}(\mathcal{L}_W)} |(Kh)(w)| \le S$ for all $h \in \mathcal{H}(r,R)$.

Assumption 4.1 is typically assumed in papers on nonparametric estimation of ill-posed indirect
regression; see, e.g. Bissantz, Hohage, Munk and Ruymgaart (2007). When $K$ is the identity
operator, Assumption 4.1(2) requires that $\|h\|_\infty \le S$ for all $h \in \mathcal{H}(r,R)$, which is a condition
imposed in Yang and Barron (1999, theorems 6 and 7) to derive their minimax rate for a standard
nonparametric regression model.

4.2 Assumption (reverse link condition). There is a finite $c > 0$ such that $\|Kh\|_W \ge
c\|[\varphi(B^{-2})]^{1/2} h\|_X$ for all $h \in L^2_X$.

Assumption 4.2 is the reverse condition of Assumption 2.2 and is often imposed in papers on
ill-posed inverse problems. We shall sometimes refer to Assumptions 2.2 and 4.2 together as the exact link
(or exact range) condition. See Subsection 4.1 for a relaxation of this condition.
4.3 Proposition. For the NPIR models, suppose that Assumptions 2.1, 4.1 and 4.2 hold. Then
the estimator $\hat h_n$ defined in (4.1) satisfies

$$\sup_{h \in \mathcal{H}(r,R)} E_h\big[\|\hat h_n - h\|_X^2\big] \le \nu_{m+1}^{-2r} R^2 + 2 n^{-1} (S^2 + \sigma_1^2) c^{-2} \sum_{k=1}^{m} [\varphi(\nu_k^{-2})]^{-1}.$$

If $m = m(n)$ is such that $n^{-1} \sum_{k=1}^{m} \nu_k^{2r} [\varphi(\nu_k^{-2})]^{-1} \asymp 1$ then, under Assumption 2.2, this estimator
$\hat h_n$ is rate-optimal in the minimax sense: there is a finite constant $C > 0$ such that

$$\sup_{h \in \mathcal{H}(r,R)} E_h\big[\|\hat h_n - h\|_X^2\big] \le C \nu_{m+1}^{-2r} \asymp n^{-1} \sum_{k=1}^{m} [\varphi(\nu_k^{-2})]^{-1} \asymp \delta_n, \quad \text{with } \delta_n \text{ given in Theorem 3.5.}$$

(1) Mildly ill-posed case: Let $\varphi(t) = t^a$ and $\nu_k \asymp k^{\varepsilon}$ for some $a, \varepsilon > 0$. If $m \asymp n^{1/(2r\varepsilon + 2a\varepsilon + 1)}$, then
$\delta_n \asymp n^{-2r/(2r + 2a + \varepsilon^{-1})}$.

(2) Severely ill-posed case: Let $\varphi(t) = \exp(-t^{-a/2})$, $\nu_k \asymp k^{\varepsilon}$ for some $a, \varepsilon > 0$. If $m = c \log(n)^{1/(a\varepsilon)}$
with a sufficiently small $c > 0$, then $\delta_n \asymp (\log n)^{-2r/a}$.

4.4 Remark. As the proof reveals, the upper bound does not require that the errors are Gaussian;
the existence of second moments suffices.

4.5 Remark. When $K$ is the identity operator, the NPIR model becomes the standard nonparametric
mean regression model, and Assumption 2.2 is automatically satisfied with $\varphi(\cdot)$ being a constant.
Then Theorem 3.5 and Proposition 4.3 together reproduce the well-known minimax lower and upper
bounds for the nonparametric mean regression model (see, e.g., theorem 7 of Yang and Barron
(1999)), in which $\delta_n \asymp m/n$, and $m$ is the largest possible integer satisfying $\nu_{m+1}^{-2r} \asymp m/n$.

Comparing the minimax optimal rates in mean integrated squared error loss for the nonparametric
mean regression model and for the NPIR model, we see that the squared bias is of the same
order ($\nu_{m+1}^{-2r}$), but the variance blows up from $m/n$ for the nonparametric mean regression model to
$n^{-1} \sum_{k=1}^{m} [\varphi(\nu_k^{-2})]^{-1}$ for the NPIR model.
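The bias-variance trade-off behind the choice of $m$ can be made explicit by evaluating the right-hand side of the bound in Proposition 4.3, read here as $\nu_{m+1}^{-2r} R^2 + 2 n^{-1} (S^2 + \sigma_1^2) c^{-2} \sum_{k \le m} [\varphi(\nu_k^{-2})]^{-1}$, over a grid of cut-offs. The sketch below does this for hypothetical values $\nu_k = k$, $\varphi(t) = t$, $r = 1$, $R = S = \sigma_1 = c = 1$ and $n = 10^6$; the function names and the grid limit `m_max` are likewise assumptions for illustration.

```python
def upper_bound(m, n, nu, phi, r, R=1.0, S=1.0, sigma1=1.0, c=1.0):
    """Squared-bias term plus variance term for cut-off m (assumed form of the bound)."""
    bias_sq = R**2 * nu(m + 1) ** (-2 * r)
    variance = 2.0 / n * (S**2 + sigma1**2) / c**2 * sum(
        1.0 / phi(nu(k) ** -2) for k in range(1, m + 1)
    )
    return bias_sq + variance

def best_cutoff(n, nu, phi, r, m_max=200):
    """Cut-off in 1..m_max that minimizes the upper bound."""
    return min(range(1, m_max + 1), key=lambda m: upper_bound(m, n, nu, phi, r))

def nu(k):
    return float(k)          # assumed eigenvalues nu_k = k

def phi(t):
    return t                 # phi(t) = t^a with a = 1 (mildly ill-posed)

n = 10**6
best_m = best_cutoff(n, nu, phi, r=1.0)
best_value = upper_bound(best_m, n, nu, phi, r=1.0)
```

Increasing $m$ beyond the minimizer lowers the bias term only polynomially while the variance term grows like $m^{2a\varepsilon + 1}/n$, which is the blow-up described above.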

Notice that the minimax optimal rate $\delta_n \asymp (\log n)^{-2r/a}$ for the severely ill-posed case is independent
of $\varepsilon$ (hence independent of the dimension $d$ of $X$). For the mildly ill-posed case, when $\varphi(t) = t^a$ and
$\nu_k \asymp k^{\varepsilon}$ for some $a > 0$ and $\varepsilon = 1/d$, Theorem 3.5 and Proposition 4.3 together give the minimax
optimal rate $\delta_n = n^{-2r/(2r + 2a + d)}$ for the NPIR models. This rate is well known for the special case
when $\mathcal{H}(r, R)$ is a $d$-dimensional Sobolev ball $H^r_R$ and the operator $K$ is elliptic with ill-posedness
degree $a$ (i.e., $\|Kh\|_W \asymp \|B^{-a}h\|_X$ for all $h \in L^2_X$); see, e.g., Cohen, Hoffmann and Reiß (2004).

Note that Assumptions 2.2 and 4.2 are automatically satisfied under the general source condition
with $K^*K = \varphi(B^{-2})$. Applying Proposition 4.3 and Remark 3.8 we immediately obtain:
4.6 Remark. For the NPIR models, suppose that Assumption 4.1 holds, and Assumption 2.1 is
satisfied with $h \in \mathcal{H}_{source}$ and $u_k = e_k$, $\nu_k^{-r} = \psi(\lambda_k)$ for all $k \ge 1$. Let $\varphi(B^{-2}) = K^*K$. Then the
estimator $\hat h_n$ defined in (4.1) with $u_k = e_k$ reaches the minimax rate uniformly over $h \in \mathcal{H}_{source}$:

$$\sup_{h \in \mathcal{H}_{source}} E_h\big[\|\hat h_n - h\|_X^2\big] \le C \delta_n, \quad \text{with } \delta_n \text{ and } m \text{ given in Remark 3.8.}$$

In the literature on ill-posed inverse problems with known operator $K$, many other estimation
procedures (like Tikhonov's method) that employ source conditions are available; some of them
lead to rate-optimal estimators only for the mildly ill-posed case. See, e.g., Bissantz, Hohage, Munk
and Ruymgaart (2007) and Florens, Johannes and Van Bellegem (2007) for recent results. 

4.1 Relaxation of the exact link condition 

For the minimax risk lower bound we impose Assumption 2.2 and for the upper bound we use
Assumption 4.2. Together, these two assumptions require that the operator $K$ satisfies

$$c\|[\varphi(B^{-2})]^{1/2} h\|_X \le \|Kh\|_W \le M\|[\varphi(B^{-2})]^{1/2} h\|_X \quad \text{for all } h \in L^2_X,$$

which is equivalent to

$$\mathrm{ran}\big([\varphi(B^{-2})]^{1/2}\big) = \mathrm{ran}(|K|). \tag{4.2}$$

This is a standard condition imposed even in books and papers on ill-posed inverse problems with
deterministic errors; see, e.g., Engl, Hanke and Neubauer (1996), Nair, Pereverzev and Tautenhahn
(2005) and the references therein. This condition usually holds when $K$ acts exactly along certain
function classes; see Section 6 for such an example. Moreover, this exact range condition (4.2) is
automatically satisfied under the source condition with $K^*K = \varphi(B^{-2})$. However, Assumption 4.2
may fail more generally. Luckily, this assumption is not strictly necessary.

Let us indicate one possibility how Assumption 4.2 can be relaxed to requiring

$$\mathrm{ran}\big([\varphi(B^{-2})]^{1/2}\big) \subset \mathrm{ran}(|K|) + L, \quad \text{for some finite-dimensional linear space } L.$$

To keep it simple, we consider the case that the subspace $L$ is spanned by one eigenfunction $u_\ell$ of
$B$ with $u_\ell \notin \mathrm{ran}(|K|)$ and $1 \le \ell \le m$. Then the simple estimator $\hat h_n$ using $\hat\eta_\ell$ given in (4.1) is no
longer well defined, but we can consider for some $v \in L^2_W$ the estimator

$$\tilde\eta_\ell := \frac{1}{n} \sum_{i=1}^{n} Y_i v(W_i).$$
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Following the bias variance decomposition in the proof of Proposition 14.31 we obtain 

E[{fji - (h,u e ) x ) 2 } < ((Kh,v) w - (h lUl ) x ? + 2n~ l (S 2 + al)\\v\\ 2 w 

^(^K^-utfx+n^WvWlr. (4.3) 

The definition of $H^r_R$ implies with some uniform constant $C > 0$

$\sup_{h \in H^r_R} E[(\tilde\eta_\ell - \langle h,u_\ell\rangle_X)^2] \le C\,(R^2\|B^{-r}(K^*v - u_\ell)\|_X^2 + n^{-1}\|v\|_W^2)$. (4.4)

From inequality (4.4), it is easy to derive that this error in estimating the coefficient $\langle h,u_\ell\rangle_X$ is
minimized by

$v = (KK^* + n^{-1}R^{-2}B^{2r})^{-1}Ku_\ell$,

which is always well-defined. Consequently, in terms of minimax optimal rate over the class of
functions $H^r_R$, the rate in Proposition 4.3 does not deteriorate if we use $\tilde\eta_\ell$ instead of $\hat\eta_\ell$ and its
error bound

$n^{-1}\|(KK^* + n^{-1}R^{-2}B^{2r})^{-1}Ku_\ell\|_W^2$

is not larger than the minimax optimal rate. See Section 6 for a concrete example.
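
The minimizing weight can be checked numerically in a simplified setting. The sketch below assumes (this is an assumption made only for illustration) that $K$ and $B$ are finite-dimensional and diagonal in the same basis, with hypothetical values $\nu_k = k$ and singular values $k^{-a}$; in that commuting case the displayed formula reduces to a coordinate-wise expression, and the code compares it against the objective $R^2\|B^{-r}(K^*v-u_\ell)\|^2 + n^{-1}\|v\|^2$ from (4.4).

```python
def objective(v, k, nu, u, n, R, r):
    """R^2 ||B^{-r}(K* v - u)||^2 + ||v||^2 / n for diagonal K = diag(k), B = diag(nu)."""
    fit = sum((nuj ** (-r) * (kj * vj - uj)) ** 2
              for vj, kj, nuj, uj in zip(v, k, nu, u))
    return R ** 2 * fit + sum(vj ** 2 for vj in v) / n


def regularized_v(k, nu, u, n, R, r):
    """Coordinate-wise form of v = (K K* + n^{-1} R^{-2} B^{2r})^{-1} K u (diagonal case)."""
    return [kj * uj / (kj ** 2 + nuj ** (2 * r) / (n * R ** 2))
            for kj, nuj, uj in zip(k, nu, u)]


# Hypothetical toy configuration (not taken from the paper).
p, a, r, n, R = 20, 1.5, 1.0, 1000, 2.0
nu = [float(j) for j in range(1, p + 1)]      # eigenvalues of B, assumed nu_k = k
k = [nuj ** (-a) for nuj in nu]               # singular values of K
u = [1.0 if j == 4 else 0.0 for j in range(p)]  # target coordinate u_ell with ell = 5
v_star = regularized_v(k, nu, u, n, R, r)
print(objective(v_star, k, nu, u, n, R, r))
```

Because the objective is strictly convex in $v$, any perturbation of `v_star` should increase it; this is what the accompanying checks verify. Outside the commuting diagonal case the sketch does not apply as written.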

5 An upper bound for the NPIV model 

We now provide an upper bound for the NPIV model. For the NPIV model additional considerations 
due to the unknown conditional expectation operator are necessary. It is, of course, more complex 
to construct an estimator that is rate-optimal for the NPIV model than for the NPIR model, 
which is why the approaches in the literature are more diverse and require different additional 
assumptions. Here, we restrict ourselves to presenting a simple estimator to illustrate that it is 
possible to construct a rate-optimal estimator for the NPIV model in both mildly ill-posed and 
severely ill-posed cases based on the SMD estimator of Newey and Powell (2003), Ai and Chen 
(2003) and Blundell, Chen and Kristensen (2007). First, for each integer $J \ge 1$, we denote by
$\mathrm{span}\{p_1,\dots,p_J\}$ a $J$-dimensional linear subspace of $L^2_W$ that becomes dense in $L^2_W$ as $J \to \infty$. Let
$P^{J_n}(w) = (p_1(w),\dots,p_{J_n}(w))'$ and $P = (P^{J_n}(W_1),\dots,P^{J_n}(W_n))'$. We compute a sieve least squares
estimator of $E[Y - h(X)|W = \cdot]$ as

$\hat E[Y - h(X)|W = \cdot] = \sum_{t=1}^n \{Y_t - h(X_t)\}P^{J_n}(W_t)'(P'P)^{-1}P^{J_n}(\cdot)$.

For each integer $m \ge 1$, we denote by $\mathcal{H}_m := \mathrm{span}\{\psi_1,\dots,\psi_m\}$ an $m$-dimensional linear subspace
of $L^2_X$ that becomes dense in $L^2_X$ as $m \to \infty$. Then we compute the SMD estimator of the true
structure function $h_0$ as

$\hat h_n = \arg\min_{h \in \mathcal{H}_{m(n)} \cap \mathcal{H}(r,R)} \frac{1}{n}\sum_{t=1}^n \{\hat E[Y - h(X)|W = W_t]\}^2$. (5.1)

Depending on the prior information about $\mathcal{H}(r,R)$, sometimes one may compute $\hat h_n$ in closed
form. For example, if $\mathcal{H}(r,R) = H^r_R$ and the density of $X$ is bounded below and above by positive
constants, then

$\hat h_n(x) = \sum_{k=1}^m \hat\pi_k\psi_k(x) = \psi^m(x)'\hat\Pi$, (5.2)

$\hat\Pi = (\Psi'P(P'P)^{-1}P'\Psi + \lambda C)^{-1}\Psi'P(P'P)^{-1}P'Y$, (5.3)

with $\Psi = (\psi^m(X_1),\dots,\psi^m(X_n))'$, $Y = (Y_1,\dots,Y_n)'$, the penalization matrix $C =
\int\{[B^r\psi^m(x)][B^r\psi^m(x)]'\}dx$ and $\lambda$ satisfies $\hat\Pi'C\hat\Pi = R^2$.
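
The closed form (5.3) is a penalized two-stage least squares formula and can be sketched in a few lines of NumPy. The sketch below is only an illustration under stated assumptions: the penalty weight `lam` is taken as given (the search for the value satisfying $\hat\Pi'C\hat\Pi = R^2$ is not implemented), the penalty matrix `C` is a placeholder identity, and the polynomial bases and the simulated data are hypothetical choices, not the bases discussed in the paper.

```python
import numpy as np


def penalized_smd(Psi, P, Y, C, lam):
    """Return (Psi'A Psi + lam*C)^{-1} Psi'A Y with A = P (P'P)^{-1} P'."""
    # Least squares projections onto the instrument sieve; avoids forming the n x n matrix A.
    coef_Psi, *_ = np.linalg.lstsq(P, Psi, rcond=None)  # (P'P)^{-1} P' Psi
    coef_Y, *_ = np.linalg.lstsq(P, Y, rcond=None)      # (P'P)^{-1} P' Y
    Psi_fit = P @ coef_Psi                              # A Psi
    Y_fit = P @ coef_Y                                  # A Y
    return np.linalg.solve(Psi.T @ Psi_fit + lam * C, Psi.T @ Y_fit)


# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 200
W = rng.uniform(size=n)
X = 0.7 * W + 0.3 * rng.uniform(size=n)
Y = np.sin(X) + 0.1 * rng.standard_normal(n)
Psi = np.vander(X, 3, increasing=True)   # psi^m(X_t) = (1, x, x^2), m = 3
P = np.vander(W, 5, increasing=True)     # P^{J_n}(W_t), J_n = 5 > m
C = np.eye(3)                            # placeholder for the penalization matrix
Pi_hat = penalized_smd(Psi, P, Y, C, lam=1e-3)
print(Pi_hat)
```

Two sanity checks follow from the formula itself: with `P = Psi` and `lam = 0` the result is ordinary least squares, and with `P = Psi`, `C` the identity and `lam > 0` it is ridge regression.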

In addition to the assumptions on the NPIR models, we impose the following: 

5.1 Assumption. The basis $\{\psi_k\}$ is a Riesz basis associated with the operator $B$, that is,
$\sum_{k=1}^\infty \langle h,\psi_k\rangle_X^2 \asymp \sum_{k=1}^\infty \langle h,u_k\rangle_X^2$ for all $h \in L^2_X$.

Assumption 5.1 allows for the use of a Riesz basis $\{\psi_k\}$ instead of the ideal orthonormal basis
$\{u_k\}$ to approximate the unknown structure function $h \in \mathcal{H}(r,R)$ with the same order of the
approximation errors. Of course in applications, we need some information about the tail behavior
of the density of $X$ before we can construct such a basis. For example, if we know that the density
of $X$ is bounded above and below by finite positive constants, then we could use wavelets as the
$\{\psi_k\}$.

5.2 Assumption.

(1) $E[Y - \Pi_m h(X)|W = \cdot]$ belongs to $\Lambda_c^{r_K}(\mathcal{W})$ (Hölder ball of regularity $r_K$) for any $\Pi_m(h) \in \mathcal{H}_m$;

(2) (i) the smallest and the largest eigenvalues of $E\{P^{J_n}(W)P^{J_n}(W)'\}$ are bounded and bounded
away from zero for each $J_n$; (ii) $P^{J_n}(W)$ is a tensor product of either a cosine series or a B-spline
basis of order $\gamma_b$ or a wavelet basis of order $\gamma_b$, with $\gamma_b > r_K > d/2$;

(3) the density of $W$ is continuous and bounded away from zero over its support $\mathcal{W}$, which is a
compact connected subset in $\mathbb{R}^d$ with Lipschitz continuous boundaries and non-empty interior;

(4) (i) $J_n \to \infty$ and $J_n^2/n \to 0$; (ii) $\lim_n J_n/m(n) = c \in (1,\infty)$ and $J_n > m(n)$.

Assumption 5.2 implies that the sieve least squares estimate $\hat E[h(X)|W = \cdot]$ of $E[h(X)|W = \cdot]$
performs well; see e.g., Blundell, Chen and Kristensen (2007) for details.

5.3 Theorem. For the NPIV models, suppose that Assumptions 2.1, 2.2, 4.1, 4.2, 5.1 and 5.2
hold. Then the estimator $\hat h_n$ defined in (5.1) satisfies

$\|\hat h_n - h\|_X^2 \le C\max\{\nu_{m+1}^{-2r},\ \frac{m}{n}[\varphi(\nu_m^{-2})]^{-1}\}$

uniformly over $h \in \mathcal{H}(r,R)$ except on an event whose probability tends to zero as $n \to \infty$. If
$m = m(n)$ is such that $n^{-1}\sum_{k=1}^m \nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} \asymp 1$, then the estimator $\hat h_n$ is rate-optimal in the
minimax sense: there is a finite constant $C > 0$ such that

$\|\hat h_n - h\|_X^2 \le C\nu_{m+1}^{-2r} \asymp \frac{m}{n}[\varphi(\nu_m^{-2})]^{-1} \asymp \delta_n$, with $\delta_n$ given in Theorem 3.5,

uniformly over $h \in \mathcal{H}(r,R)$ except on an event whose probability tends to zero as $n \to \infty$.

(1) Mildly ill-posed case: Let $\varphi(t) = t^a$ and $\nu_k \asymp k^\epsilon$ for some $a, \epsilon > 0$. If $m \asymp n^{1/(2r\epsilon+2a\epsilon+1)}$, then
$\delta_n \asymp n^{-2r/(2r+2a+\epsilon^{-1})}$.

(2) Severely ill-posed case: Let $\varphi(t) = \exp(-t^{-a/2})$, $\nu_k \asymp k^\epsilon$ for some $a, \epsilon > 0$. If $m = c\log(n)^{1/(a\epsilon)}$
with a sufficiently small $c > 0$, then $\delta_n \asymp (\log n)^{-2r/a}$.
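
The balancing rule for $m$ in the mildly ill-posed case can be made concrete with a short computation. The sketch below assumes, purely for illustration, that $\nu_k = k^\epsilon$ holds exactly and that all constants equal one, so that $\nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} = k^{2\epsilon r + 2\epsilon a}$; the function names are hypothetical.

```python
def largest_m(n, r, a, eps):
    """Largest m with (1/n) * sum_{k<=m} k**(2*eps*r + 2*eps*a) <= 1 (0 if there is none)."""
    p = 2.0 * eps * r + 2.0 * eps * a
    total, m = 0.0, 0
    while total + (m + 1) ** p <= n:
        m += 1
        total += m ** p
    return m


def balanced_rate(n, r, a, eps):
    """Order nu_m**(-2r) = m**(-2*eps*r) of delta_n at the balancing m."""
    return largest_m(n, r, a, eps) ** (-2.0 * eps * r)


n, r, a, eps = 10 ** 6, 1.0, 1.0, 1.0
m = largest_m(n, r, a, eps)
print(m, n ** (1.0 / (2 * eps * r + 2 * eps * a + 1)), balanced_rate(n, r, a, eps))
```

For these values the selected $m$ and the theoretical order $n^{1/(2r\epsilon+2a\epsilon+1)}$ agree up to a constant factor, as the $\asymp$ relation in part (1) suggests.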

This minimax rate theorem appears to be new in the literature, and can be proved by slightly
modifying the proof of Theorem 2 in Blundell, Chen and Kristensen (2007). Hall and Horowitz
(2005) obtained the minimax optimal rate $\sup_{h \in \mathcal{H}_{HH}} E_h[\|\hat h_n - h\|_X^2] \le C\delta_n$ for their estimators in the
mildly ill-posed case for the class of functions $\mathcal{H}_{HH}$ defined in (2.7). Hoffmann and Reiß (2007)
propose a wavelet estimator in the case of an unknown operator $K$ that is elliptic with ill-posedness
degree $a$. They assume there exists an estimator of $K$ with specified rate, and their class of functions
$\mathcal{H}(r,R)$ is a Besov ball that could be bigger than the function class defined in our Assumption 2.1,
but they do not consider the severely ill-posed case.

6 More on regularity conditions 

In this section, we use examples to discuss the pros and cons of the approach of imposing two 
basic regularity conditions (the approximation and the link conditions) versus the other approach 
of using the general source condition. To simplify the discussion, here we assume the operator K is 
known. In the first class of examples, the operator $K$ has a very smooth eigenfunction basis (in the
sense that its eigenfunctions are many times differentiable), while in the second class of examples,
the operator $K$ has eigenfunctions that are not differentiable.

6.1 Examples of K having infinitely differentiable eigenfunctions

Suppose that the $\{W_i\}_{i=1}^n$ are uniformly distributed on $[0,1]$ and $K$ is a circular convolution operator
on $L^2([0,1])$: $Kh(w) = \int_0^1 k(x-w)h(x)\,dx$ with a 1-periodic function $k$ that satisfies $k(-x) = k(x)$
and has Fourier coefficients $|\mathcal{F}k(m)| = |\int_0^1 k(x)\cos(2\pi mx)\,dx| \asymp (1+|m|)^{-a}$. Then $K$ is a
positive-definite self-adjoint operator which is diagonalized by the Fourier basis.
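
The diagonalization by the Fourier basis can be checked on a grid. The sketch below discretizes the circular convolution by a Riemann sum on the points $j/N$ and verifies that a cosine is mapped to a multiple of itself, the multiple being the corresponding cosine coefficient of the kernel. The kernel $k(x) = \exp(-10\,\mathrm{dist}(x,\mathbb{Z}))$, the grid size and the frequency are hypothetical choices made for illustration, and the 1-periodic convention $\cos(2\pi m x)$ is used.

```python
import math


def apply_circular_convolution(kernel, h):
    """Riemann-sum version of (Kh)(w) = int_0^1 k(x - w) h(x) dx on the grid j/N."""
    N = len(kernel)
    return [sum(kernel[(x - w) % N] * h[x] for x in range(N)) / N for w in range(N)]


def cosine_coefficient(kernel, m):
    """Discrete analogue of int_0^1 k(x) cos(2*pi*m*x) dx."""
    N = len(kernel)
    return sum(kernel[j] * math.cos(2 * math.pi * m * j / N) for j in range(N)) / N


N, m = 64, 3
# Even, 1-periodic kernel sampled on the grid (illustrative choice).
kernel = [math.exp(-10 * min(j / N, 1 - j / N)) for j in range(N)]
cos_m = [math.cos(2 * math.pi * m * j / N) for j in range(N)]
K_cos = apply_circular_convolution(kernel, cos_m)
lam = cosine_coefficient(kernel, m)
print(max(abs(K_cos[w] - lam * cos_m[w]) for w in range(N)))
```

The evenness of the kernel is what removes the sine component, so the discretized operator is symmetric and the cosine is an exact eigenvector up to floating-point error.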

Source condition: In this canonical case the exact link between $K$ and $B$ is easily established with
$B = K^{-1}$, $\varphi(t) = t$, hence $\|Kh\|_W = \|[\varphi(B^{-2})]^{1/2}h\|_X$ for all $h \in L^2([0,1])$. The smoothness
of the unknown function $h$ is also described using $B = K^{-1}$, hence the Hilbert scale space
$H^r$ (generated by $B$) is equal to the classical periodic Sobolev space $H^{ra}_{per}$ of smoothness (or
regularity) $ra$. Applying Remark 4.6 we obtain the minimax optimal rate for this scale of periodic
Sobolev spaces.

Approximation + link conditions: Suppose $\{u_k\}_{k \ge 1}$ is an orthonormal basis of $L^2([0,1])$ such
that $\|g\|^2_{H^{-a}_{per}([0,1])} \asymp \sum_{k=1}^\infty k^{-2a}\langle g,u_k\rangle^2$. A typical example is given by sufficiently regular
periodized wavelet bases (see Cohen, Daubechies and Vial (1993)). Then we can define

$Bg := \sum_k k\langle g,u_k\rangle u_k$,

and the Hilbert scale spaces $H^r$ can be interpreted as approximation spaces for the basis
$\{u_k\}$. In the convolution example we obtain $\|Kg\|_W \asymp \|B^{-a}g\|_X$. Consequently, the exact
link conditions (Assumptions 2.2 and 4.2) between $K$ and $B$ hold with $\varphi(t) = t^a$. Applying
Proposition 4.3 we obtain the minimax optimal rate for the Hilbert scale space $H^r$ generated by
$B$.

The Hilbert scale of approximation spaces generated by $B$ does not necessarily coincide with the
Hilbert scale generated by $K$. The most pronounced example is the case $a < 1/2$, where all
non-periodic wavelets on an interval still satisfy $\|g\|^2_{H^{-a}([0,1])} \asymp \sum_{k=1}^\infty k^{-2a}\langle g,u_k\rangle^2$ (see Cohen,
Daubechies and Vial (1993)). Hence, the approximation spaces for the unknown true structure function
need not exhibit any boundary condition. This means that a smooth, but non-periodic function
on $[0,1]$ will have high regularity $r$ in terms of the approximation space, while it is an element in
periodic Sobolev spaces up to regularity 1/2 only. If we have in mind that our true function $h$ is
smooth, but not periodic, we should therefore rather choose the approximation space approach.
On the other hand, wavelets work well just to some maximal regularity and they will therefore
reconstruct very smooth and periodic functions not as well as the Fourier basis.

If $K$ is more ill-posed, that is $a \ge 1/2$, we can adopt the ideas explained in Subsection 4.2. We
remain in the approximation space framework and use non-periodic compactly supported wavelets
as basis functions $\{u_k\}$. Only the wavelets $\psi_\lambda$ with support at the boundary are not in the periodic
Sobolev spaces $H^s_{per}$, $s \ge 1/2$. Using some (statistical) kernel function $L_b : [-b,b] \to \mathbb{R}$ of bandwidth
$b$, we can consider the periodically smoothed version

$\tilde\psi_\lambda(x) := \int_{-b}^{b} \psi_\lambda(\{x-y\})L_b(y)\,dy$, $x \in [0,1]$,

where $\{z\} = z - \lfloor z\rfloor \in [0,1)$ denotes the fractional part of $z \in \mathbb{R}$. If $L$ and $\psi_\lambda$ are sufficiently often
differentiable, then $\tilde\psi_\lambda$ lies in the range $H^a_{per}$ of $K$. Using $v = K^{-1}\tilde\psi_\lambda$ in equation (4.3), standard
kernel estimates ($h \in H^r_R$ implies $h \in H^s_{per}$ for all $s \le r$ and $s < 1/2$) show that for all $h \in H^r_R$
(with adapted notation)

$E[(\tilde\eta_\lambda - \langle h,\psi_\lambda\rangle)^2] \le C_1(\langle h,\tilde\psi_\lambda - \psi_\lambda\rangle^2 + n^{-1}\|K^{-1}\tilde\psi_\lambda\|^2) \le C_2(b^{2s} + n^{-1}b^{-2a})$.

Optimizing over $b$, we infer that $\langle h,\psi_\lambda\rangle$ can be estimated at rate $n^{-s/(s+a)}$, which for $r \ge 1/2$
is nearly $n^{-1/(2a+1)}$. Since in a wavelet approximation space of dimension $2^J$ only of the order $J$
wavelets lie at the boundary, the rate in estimating $h$ will be $n^{-s/(s+a)}\log(n) + n^{-2r/(2r+2a+1)}$,
which for $r \ge 1/2 + 1/(4a)$ is roughly $n^{-1/(2a+1)}$. If we had taken a method based on the source condition
approach (like projection on eigenfunctions of $K$, or Tikhonov methods) the best achievable rate
would have been roughly $n^{-1/(2a+2)}$.
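
The bandwidth trade-off behind the rate $n^{-s/(s+a)}$ can be traced numerically. The sketch below minimizes the bound $b^{2s} + n^{-1}b^{-2a}$ with all constants set to one (an assumption made for illustration); setting the derivative to zero gives $b^* = (a/(sn))^{1/(2s+2a)}$, and the function names are hypothetical.

```python
import math


def boundary_coefficient_bound(b, n, s, a):
    """Bound b**(2s) + b**(-2a) / n on the squared error (constants dropped)."""
    return b ** (2 * s) + b ** (-2 * a) / n


def optimal_bandwidth(n, s, a):
    """Minimiser of the bound over b > 0: b* = (a / (s*n))**(1 / (2s + 2a))."""
    return (a / (s * n)) ** (1.0 / (2 * s + 2 * a))


s, a = 0.45, 1.0          # illustrative values with s < 1/2
n1, n2 = 10 ** 4, 10 ** 6
v1 = boundary_coefficient_bound(optimal_bandwidth(n1, s, a), n1, s, a)
v2 = boundary_coefficient_bound(optimal_bandwidth(n2, s, a), n2, s, a)
# Empirical log-log slope of the optimised bound versus n, next to -s/(s+a).
print(math.log(v2 / v1) / math.log(n2 / n1), -s / (s + a))
```

As $s$ approaches 1/2, the exponent $s/(s+a)$ approaches $1/(2a+1)$, which is the rate quoted in the text.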

6.2 Examples of K having non-differentiable eigenfunctions 

Depending on applications, it is perfectly conceivable that the eigenfunctions of $K$ are rough while
the basis functions $u_k$ of $B$ are smooth (or differentiable). For example, we can use the Haar
basis $\psi_{jk}(x) = 2^{j/2}\psi(2^jx - k)$ on $L^2([0,1])$ ($\psi(x) = 1_{[0,1/2]} - 1_{[1/2,1]}$, $j \in \mathbb{N}_0$, $k = 0,\dots,2^j - 1$,
together with the constant function $1_{[0,1]}$) and define - somewhat artificially - an operator $K$ that is
diagonal in this Haar basis, with eigenvalues of order $2^{-ja}$ on level $j$.

Then $K$ is self-adjoint with eigenfunctions $\psi_{jk}$, which are step functions. For $ar < 1/2$, the Hilbert
scale $H^r$ of $K$ (or of the Haar basis) will be a Sobolev space, whereas for any $ar \ge 1/2$ this
Hilbert scale $H^r$ will not be described in terms of traditional smoothness. Note that this $H^r$ will
always contain piecewise constant jump functions. Nevertheless, the larger $r$ the less complex is the
function class $\mathcal{H}(r,R)$, that is the smaller the approximation error rate. As for the convolution
operator we could instead define the function class $\mathcal{H}(r,R)$ in terms of a basis $\{u_k\}$ associated to
$B$ which is smoother and satisfies at the same time the link conditions of Assumptions 2.2 and 4.2.

In conclusion, we see that rate-optimal methods may behave poorly if the function of interest, the
structure function h, is not regular in the setting for which the method is designed. An important 
part of the specification of rate optimality is therefore always the associated function class. 

7 Perspectives 

In this paper, we clarify the relations between the existing sets of regularity conditions for conver- 
gence rates of NPIV regression models. We establish minimax risk lower bounds in mean squared 
error loss for the NPIV models under two basic regularity conditions that allow for both mildly 
ill-posed and severely ill-posed cases. We also show that the simple SMD estimator achieves the 
minimax risk lower bound, hence is rate-optimal for both mildly ill-posed and severely ill-posed 
cases. 

Many of the ideas in this paper can be easily adapted to treat other kinds of ill-posed inverse 
problems in econometrics. For instance, when the problem is mildly ill-posed, Horowitz and Lee 
(2007) show that their kernel based Tikhonov regularized estimator of nonparametric quantile 
instrumental variables (IV) regression reaches the minimax rate under conditions very similar to 
those imposed in Hall and Horowitz (2005) for NPIV regression. Similarly, one could show that the 
penalized SMD estimator proposed in Chen and Pouzo (2007) for nonlinear and possibly nonsmooth 
nonparametric conditional moment models is also rate-optimal, as their estimator achieves the 
minimax risk lower bounds established in our paper for the NPIV regression model. 

Once this is established, the intriguing open problem remains how to choose the regularization 
parameters adaptively from the data, not knowing the true regularity, and even to select among 
the different proposed procedures (e.g. generated by different operators $B$) in a data-driven way.

Appendix: Proofs

Proof of Lemma 3.4. Let $\hat h_n = \hat h_n(\{(X_i,Y_i,W_i)\}_{i=1}^n)$ be an estimator for the NPIV model.
Knowing the operator $K$ amounts to knowing the conditional law of $X_i$ given $W_i$. Let
us call the observations in the NPIR model $\{(Y_i',W_i')\}_{i=1}^n$ for some $(\mathcal{L}_W,\sigma(\cdot),h)$ in the
NPIR model class. We then generate artificially i.i.d. observations $X_i'$ according to the conditional law
$\mathcal{L}_{X|W=w}$ with $w = W_i'$. Then the observations $\{(X_i',Y_i',W_i')\}_{i=1}^n$ follow the law of some
$(\mathcal{L}_{UWX},h)$ in the NPIV model class because $Y_i' = h(X_i') + U_i'$ holds with $U_i' = (Kh)(W_i') - h(X_i') + V_i'$
satisfying $E[U_i'|W_i'] = 0$ and $\mathcal{L}_{V'|W'=w} = N(0,\sigma^2(w))$. Consequently, the (randomized) estimator
$\tilde h_n(\{(Y_i',W_i')\}_{i=1}^n) := \hat h_n(\{(X_i',Y_i',W_i')\}_{i=1}^n)$ has the same risk under $(\mathcal{L}_W,\sigma(\cdot),h)$ in the NPIR
model class as $\hat h_n$ has under $(\mathcal{L}_{U'W'X'},h)$ in the NPIV model class, and is thus not larger than the
maximal risk over the NPIV model class. □



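Proof of Theorem 3.5. We consider for $\vartheta = (\vartheta_k)$ with $\vartheta_k \in \{-1,+1\}$ and a sequence $(\gamma_k)$, to
be specified below, the following functions in $L^2_X$:

$h_\vartheta := \sum_{k=1}^m \vartheta_k\gamma_k u_k$.

The property $h_\vartheta \in H^r_R$ yields the following constraint on $m$ and $(\gamma_k)$:

$\sum_{k=1}^m \nu_k^{2r}\gamma_k^2 \le R^2$.

For $\ell = 1,\dots,m$ and each $\vartheta$ introduce $\vartheta^{(\ell)}$ by $\vartheta^{(\ell)}_k = \vartheta_k$ for $k \ne \ell$ and $\vartheta^{(\ell)}_\ell = -\vartheta_\ell$. Then
because of the Gaussianity of the $V_i$ given $W_i$ the log-likelihood of $P_{\vartheta^{(\ell)}}$ w.r.t. $P_\vartheta$ is

$\log\frac{dP_{\vartheta^{(\ell)}}}{dP_\vartheta} = \sum_{i=1}^n \Big(-\frac{2\vartheta_\ell\gamma_\ell(Ku_\ell)(W_i)}{\sigma^2(W_i)}V_i - \frac{2\gamma_\ell^2(Ku_\ell)^2(W_i)}{\sigma^2(W_i)}\Big)$.

Its expectation satisfies

$E_\vartheta\Big[\log\frac{dP_{\vartheta^{(\ell)}}}{dP_\vartheta}\Big] \ge -2M\sigma_0^{-2}\gamma_\ell^2 n\|[\varphi(B^{-2})]^{1/2}u_\ell\|_X^2 = -2M\sigma_0^{-2}\gamma_\ell^2 n\varphi(\nu_\ell^{-2}) =: \mu_\ell$.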



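In terms of the Kullback-Leibler divergence this means $KL(P_{\vartheta^{(\ell)}}, P_\vartheta) \le -\mu_\ell$. More explicitly,
we obtain by Markov's inequality

$P_\vartheta\Big(\sum_{i=1}^n \frac{2\gamma_\ell^2(Ku_\ell)^2(W_i)}{\sigma^2(W_i)} \ge -2\mu_\ell\Big) \le \frac{1}{2}$.

Using the symmetry of the distribution of $V_i$ given $W_i$, we infer by conditioning on $(W_i)_{1 \le i \le n}$

$P_\vartheta\Big(\frac{dP_{\vartheta^{(\ell)}}}{dP_\vartheta} \ge \exp(2\mu_\ell)\Big) = E_\vartheta\Big[P_\vartheta\Big(\log\frac{dP_{\vartheta^{(\ell)}}}{dP_\vartheta} \ge 2\mu_\ell \,\Big|\, (W_i)_{1 \le i \le n}\Big)\Big] \ge \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$.

We calculate for each estimator $\hat h_n$:

$\sup_{h \in H^r_R} E_h[\|\hat h_n - h\|_X^2]$

$\ge \sup_{\vartheta \in \{-1,+1\}^m} E_\vartheta[\|\hat h_n - h_\vartheta\|_X^2]$

$\ge 2^{-m}\sum_{\vartheta \in \{-1,+1\}^m}\sum_{k=1}^m E_\vartheta[\langle\hat h_n - h_\vartheta,u_k\rangle_X^2]$

$= \sum_{k=1}^m 2^{-m}\sum_{\vartheta \in \{-1,+1\}^m} \frac{1}{2}E_\vartheta\Big[\langle\hat h_n - h_\vartheta,u_k\rangle_X^2 + \langle\hat h_n - h_{\vartheta^{(k)}},u_k\rangle_X^2\,\frac{dP_{\vartheta^{(k)}}}{dP_\vartheta}\Big]$

$\ge \sum_{k=1}^m 2^{-m}\sum_{\vartheta \in \{-1,+1\}^m} \frac{\exp(2\mu_k)}{2}E_\vartheta\Big[\big(\langle\hat h_n - h_\vartheta,u_k\rangle_X^2 + \langle\hat h_n - h_{\vartheta^{(k)}},u_k\rangle_X^2\big)1\Big\{\frac{dP_{\vartheta^{(k)}}}{dP_\vartheta} \ge \exp(2\mu_k)\Big\}\Big]$

$\ge \sum_{k=1}^m 2^{-m}\sum_{\vartheta \in \{-1,+1\}^m} \frac{\exp(2\mu_k)}{4}\langle h_\vartheta - h_{\vartheta^{(k)}},u_k\rangle_X^2\,P_\vartheta\Big(\frac{dP_{\vartheta^{(k)}}}{dP_\vartheta} \ge \exp(2\mu_k)\Big)$

$\ge \sum_{k=1}^m \frac{\exp(2\mu_k)}{4}\gamma_k^2$,

where we used $\exp(2\mu_k) \le 1$, the inequality $x^2 + y^2 \ge (x-y)^2/2$ and $\langle h_\vartheta - h_{\vartheta^{(k)}},u_k\rangle_X^2 = 4\gamma_k^2$.

We choose $\gamma_k = \sigma_0 n^{-1/2}[\varphi(\nu_k^{-2})]^{-1/2}$ such that $\mu_k = -2M$ and then pick the largest $m \ge 1$
such that $\sum_{k=1}^m \nu_k^{2r}\gamma_k^2 \le R^2$.
This gives the lower bound

$\inf_{\hat h_n}\sup_{h \in H^r_R} E_h[\|\hat h_n - h\|_X^2] \ge \inf_{\hat h_n}\sup_{\vartheta} E_\vartheta[\|\hat h_n - h_\vartheta\|_X^2] \ge \frac{\sigma_0^2}{4e^{4M}}\sum_{k=1}^m n^{-1}[\varphi(\nu_k^{-2})]^{-1}$,

where $m$ is largest possible with $\sum_{k=1}^m \nu_k^{2r}\gamma_k^2 \le R^2$, i.e.

$\sigma_0^2 n^{-1}\sum_{k=1}^m \nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} \le R^2$.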

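(1) (mildly ill-posed case): When $\varphi(t) = t^a$ and $\nu_k \asymp k^\epsilon$ for some $a, \epsilon > 0$, we have asymptotically
as $n \to \infty$:

$n^{-1}\sum_{k=1}^m \nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} \asymp n^{-1}\sum_{k=1}^m k^{2\epsilon r+2\epsilon a} \asymp n^{-1}m^{2\epsilon r+2\epsilon a+1}$.

Hence, choosing $m \asymp n^{1/(2\epsilon r+2\epsilon a+1)}$ we obtain the asymptotic lower bound

$\delta_n \asymp n^{-1}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} \asymp \nu_m^{-2r} \asymp n^{-2r/(2r+2a+\epsilon^{-1})}$.

(2) (severely ill-posed case): When $\varphi(t) = \exp(-t^{-a/2})$, $\nu_k \asymp k^\epsilon$ for some $a, \epsilon > 0$, we have

$n^{-1}\sum_{k=1}^m \nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} = n^{-1}\sum_{k=1}^m k^{2\epsilon r}\exp(k^{a\epsilon}) \asymp n^{-1}m^{2\epsilon r}\exp(m^{a\epsilon})$,

which means that we have to choose $m = c\log(n)^{1/(a\epsilon)}$ with a sufficiently small $c > 0$. The resulting
lower bound is

$\delta_n \asymp n^{-1}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} \asymp n^{-1}\exp(m^{a\epsilon}) \asymp m^{-2\epsilon r} \asymp (\log n)^{-2r/a}$.

□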
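
The choice of $(\gamma_k)$ and $m$ in the hypercube construction can be traced numerically. The sketch below computes $\gamma_k = \sigma_0 n^{-1/2}[\varphi(\nu_k^{-2})]^{-1/2}$ and the largest $m$ satisfying $\sum_{k \le m}\nu_k^{2r}\gamma_k^2 \le R^2$ for the severely ill-posed example. The parameter values, the exact equality $\nu_k = k^\epsilon$ and the function names are assumptions made for illustration only.

```python
import math


def hypercube_design(n, r, R, sigma0, nu, phi, k_max=10_000):
    """Return [gamma_1, ..., gamma_m] for the largest m with sum nu_k^{2r} gamma_k^2 <= R^2."""
    gammas, weighted = [], 0.0
    for k in range(1, k_max + 1):
        g = sigma0 / math.sqrt(n * phi(nu(k) ** -2))
        term = nu(k) ** (2 * r) * g * g
        if weighted + term > R * R:
            break
        weighted += term
        gammas.append(g)
    return gammas


# Severely ill-posed example with illustrative values a = eps = r = R = sigma0 = 1.
a, eps, r, R, sigma0, n = 1.0, 1.0, 1.0, 1.0, 1.0, 10 ** 6
nu = lambda k: float(k) ** eps
phi = lambda t: math.exp(-t ** (-a / 2))
gammas = hypercube_design(n, r, R, sigma0, nu, phi)
m = len(gammas)
# Order of the lower bound (constants dropped): sum of gamma_k^2.
print(m, math.log(n), sum(g * g for g in gammas))
```

With these values the selected $m$ stays of the order $\log n$, in line with part (2) of the proof, and each $\gamma_k$ makes $\sigma_0^{-2}\gamma_k^2 n\varphi(\nu_k^{-2})$ equal to one, so that $\mu_k$ does not depend on $k$.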

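Proof of Proposition 4.3. We have

$E[\hat\eta_k] = E[(Kh)(W_i)((K^*)^{-1}u_k)(W_i)] = \langle Kh,(K^*)^{-1}u_k\rangle_W = \langle h,u_k\rangle_X$

and

$\mathrm{Var}(\hat\eta_k) = \frac{1}{n}\mathrm{Var}\big((Kh)(W_i)((K^*)^{-1}u_k)(W_i) + V_i((K^*)^{-1}u_k)(W_i)\big)$

$\le 2n^{-1}\big(\|Kh\|_\infty^2E[((K^*)^{-1}u_k)^2(W_i)] + E[V_i^2]E[((K^*)^{-1}u_k)^2(W_i)]\big)$

$\le 2n^{-1}(S^2 + \sigma_1^2)\|(K^*)^{-1}u_k\|_W^2$.

From $\|Kg\|_W \ge c\|[\varphi(B^{-2})]^{1/2}g\|_X$ for all $g \in L^2_X$ we infer by duality $\|(K^*)^{-1}g\|_W \le
c^{-1}\|[\varphi(B^{-2})]^{-1/2}g\|_X$ for all $g \in \mathrm{ran}(K^*)$. Hence,

$E_h[\|\hat h_n - h\|_X^2] \le 2n^{-1}(S^2 + \sigma_1^2)c^{-2}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} + \sum_{k=m+1}^\infty \langle h,u_k\rangle_X^2$.

From $h \in \mathcal{H}(r,R)$ we have the bias estimate

$\sum_{k=m+1}^\infty \langle h,u_k\rangle_X^2 \le \nu_{m+1}^{-2r}R^2$.

When choosing $m$ as for the lower bound, then the variance term matches the lower bound
in order and the estimator $\hat h_n$ attains the minimax rate provided the bias term is not
of larger order. This is equivalent to requiring for some uniform constant $c > 0$ that
$\nu_{m+1}^{2r}n^{-1}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} \ge c$, which in turn follows from $\nu_{m+1} \ge \nu_k$ for $k \le m$ and
the choice of $m$, for which $n^{-1}\sum_{k=1}^m \nu_k^{2r}[\varphi(\nu_k^{-2})]^{-1} \asymp 1$.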
(1) For the mildly ill-posed case with $\varphi(t) = t^a$, $\nu_k \asymp k^\epsilon$, we have

$n^{-1}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} \asymp n^{-1}\sum_{k=1}^m k^{2a\epsilon} \asymp n^{-1}m^{2a\epsilon+1} \asymp m^{-2r\epsilon}$

by setting $m \asymp n^{1/(2\epsilon r+2\epsilon a+1)}$. Thus we obtain the upper bound: $\delta_n \asymp m^{-2r\epsilon} \asymp n^{-2r/(2r+2a+\epsilon^{-1})}$.

(2) For the severely ill-posed case with $\varphi(t) = \exp(-t^{-a/2})$, $\nu_k \asymp k^\epsilon$, we have

$n^{-1}\sum_{k=1}^m [\varphi(\nu_k^{-2})]^{-1} = n^{-1}\sum_{k=1}^m \exp(k^{a\epsilon}) \asymp n^{-1}\exp(m^{a\epsilon}) \asymp m^{-2r\epsilon}$

by setting $m = c\log(n)^{1/(a\epsilon)}$ with a sufficiently small $c > 0$. Thus we obtain the upper bound:
$\delta_n \asymp m^{-2r\epsilon} \asymp (\log n)^{-2r/a}$. □

Proof of Theorem 5.3. Given Assumption 5.1 ($\{\psi_k\}$ is a Riesz basis associated with the operator
$B$), there is a bounded invertible operator on $L^2_X$ that maps $\psi_k$ to $u_k$ for all $k$. This
implies that $\mathcal{H}_{m(n)} = \mathrm{span}\{u_1,\dots,u_{m(n)}\}$. Denote $\Pi_{m(n)}(h)$ as the projection of $h \in \mathcal{H}(r,R)$
onto $\mathcal{H}_{m(n)}$. Then

$\|\hat h_n - h\|_X^2 \le 2\{\|\Pi_{m(n)}(h) - h\|_X^2 + \|\hat h_n - \Pi_{m(n)}(h)\|_X^2\}$.

As in Blundell, Chen and Kristensen (2007), we define $\tau_n$ as a sieve measure of ill-posedness:

$\tau_n := \sup_{h \in \mathcal{H}_{m(n)}: h \ne 0}\frac{\|h\|_X}{\|Kh\|_W} = \sup_{h \in \mathrm{span}\{u_1,\dots,u_{m(n)}\}: h \ne 0}\frac{\|h\|_X}{\|Kh\|_W}$,

which is well defined under the conditions for identification. Then

$\|\hat h_n - \Pi_{m(n)}(h)\|_X \le \tau_n \times \|K[\hat h_n - \Pi_{m(n)}(h)]\|_W$.

Under Assumption 5.2, by the definition of $\hat h_n$ and applying Claims 2 and 3 in Blundell, Chen
and Kristensen (2007), we have:

$\|\hat h_n - \Pi_{m(n)}(h)\|_X \le \tau_n \times \{O_p(J_n^{-r_K} + \sqrt{J_n/n}) + \|K[h - \Pi_{m(n)}(h)]\|_W\}$,

where the $O_p()$ holds uniformly over $h \in \mathcal{H}(r,R)$.
By definition of $\tau_n$ we have:

$\tau_n^2 \le \sup_{h \in \mathcal{H}_{m(n)}: h \ne 0}\frac{\|h\|_X^2}{c^2\|[\varphi(B^{-2})]^{1/2}h\|_X^2} \le c^{-2}[\varphi(\nu_{m(n)}^{-2})]^{-1}$,

where the first inequality is due to Assumption 4.2 (the reverse link condition), and the second
inequality holds because $\nu_k$ is increasing in $k$ and $\varphi(t)$ is a non-decreasing function in $t \ge 0$.

By definition of $\tau_n$, under Assumptions 2.1, 2.2, 4.2 and $\lim_n J_n/m(n) = c \in (1,\infty)$ and
$J_n > m(n)$, we obtain:

$\tau_n\|K[h - \Pi_{m(n)}(h)]\|_W \lesssim \|h - \Pi_{m(n)}(h)\|_X \le R\,\nu_{m(n)+1}^{-r}$,

thus

$\|\hat h_n - h\|_X^2 \le C\max\{\nu_{m(n)+1}^{-2r},\ \frac{m(n)}{n}[\varphi(\nu_{m(n)}^{-2})]^{-1}\}$

uniformly over $h \in \mathcal{H}(r,R)$ except on an event whose probability tends to zero as $n \to \infty$. □